ABSTRACT

We consider two tree-based indexing schemes that are widely used in practical systems as the basis
for both primary and secondary key indexing. We define the B-tree and describe its features,
advantages and disadvantages. The differences between the B+-tree and the B-tree are also discussed.
We present a search algorithm, examples and figures for the B+-tree.
I. INTRODUCTION

In a tree-based indexing scheme, the search generally starts at the root node. Depending on the
conditions that are satisfied at the node under examination, a branch is made to one of several nodes, and the
procedure is repeated until we find a match or encounter a leaf node.

Tree Schemes: Each node of the tree except the leaf nodes can be considered to consist of the following
information:

$[n, T_{i1}, k_{i1}, T_{i2}, k_{i2}, \ldots, T_{in}, k_{in}, T_{i(n+1)}]$

where the $k_{ij}$'s are key values and the $T_{ij}$'s are pointers. For an m-order tree the following conditions are true:

> $n < m$

> $k_{i1} < k_{i2} < \ldots < k_{in}$

> each of the pointers $T_{ij}$, $1 \le j \le (n+1)$, points to a subtree containing values less than $k_{ij}$
and greater than or equal to $k_{i(j-1)}$.

The leaf nodes of the B+-tree are quite similar to the non-leaf nodes, except that the pointers in the leaf nodes do
not point to subtrees. The pointers $T_{Lj}$, $1 \le j \le n$, in the leaf nodes point to storage areas containing either records
having a key value $k_{Lj}$, or pointers to records, each of which has a key value $k_{Lj}$. The number of key values in
each leaf node is at least $\lfloor (m-1)/2 \rfloor$ and at most $m-1$.

The pointer $T_{L(n+1)}$ is used to chain the leaf nodes in sequential order. This allows for sequential processing of
the underlying file of records. The following conditions are satisfied by the nodes of a B+-tree:-

> The height of the tree is $\ge 1$.

> The root has at least two children.

> All nodes other than the root node and the leaf nodes have at least $\lceil m/2 \rceil$ children, where m is
the order of the tree.

> All leaf nodes are at the same level.
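The leaf chaining described above can be illustrated with a minimal Python sketch. The class name `Leaf`, the field `next_leaf` (standing in for the chain pointer $T_{L(n+1)}$), and the sample keys and record labels are assumptions made for illustration, not part of the original scheme.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class Leaf:
    """Hypothetical leaf node: keys[j] pairs with records[j]."""
    keys: List[int]
    records: List[str]
    next_leaf: Optional["Leaf"] = None  # plays the role of the chain pointer


def scan_in_key_order(first_leaf: Leaf) -> Iterator[Tuple[int, str]]:
    """Yield (key, record) pairs by following the leaf chain left to right."""
    leaf = first_leaf
    while leaf is not None:
        yield from zip(leaf.keys, leaf.records)
        leaf = leaf.next_leaf


# Three illustrative leaves, linked right to left so each can point at its successor.
third = Leaf([9, 12], ["r9", "r12"])
second = Leaf([5, 7, 8], ["r5", "r7", "r8"], third)
first = Leaf([2, 3, 4], ["r2", "r3", "r4"], second)

ordered_keys = [key for key, _ in scan_in_key_order(first)]
```

Because every key appears in some leaf and the leaves are linked in key order, a sequential pass never needs to revisit the internal nodes.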



Operations: All operations on a B+-tree require access to the leaf nodes.



Search : The number of nodes accessed is equal to the height of the tree. Once the required leaf node is reached, 
we can retrieve the pointer for the storage location containing the records; knowing the storage location, we can 
retrieve the required records. 

Insertion: We assume that the records themselves are inserted in the pertinent storage locations. An insertion
or deletion that violates the conditions on the number of keys in a node requires the redistribution of keys
among the node, its sibling and their parent. If, after insertion of the key, the node has more than m-1 keys, the
node is said to overflow. Overflow can be handled by redistribution if the number of entries in the left or right
sibling of the node is less than the maximum.
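The overflow rule can be sketched in Python as follows. The function name `insert_key` and its parameters are hypothetical; the sketch only decides how an overflow would be handled, and it assumes that a split is the fallback when no sibling has room.

```python
import bisect


def insert_key(leaf_keys, key, m, left_count=None, right_count=None):
    """Insert key into a sorted leaf of an order-m tree and report the outcome.

    left_count / right_count give the number of keys in each sibling,
    or None when that sibling does not exist.
    """
    keys = list(leaf_keys)
    bisect.insort(keys, key)          # keep the keys in ascending order
    max_keys = m - 1
    if len(keys) <= max_keys:
        return keys, "fits"
    siblings = [c for c in (left_count, right_count) if c is not None]
    if any(c < max_keys for c in siblings):
        return keys, "redistribute"   # a sibling still has a free slot
    return keys, "split"              # assumed fallback: no sibling has room
```

For an order-4 tree, a leaf holding three keys with no left sibling and a full right sibling reports "split", while the same leaf next to a sibling holding two keys reports "redistribute".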



Deletion: The leaf node containing the key to be deleted is found and the key entry in the node is deleted. If the
resultant node, call it TD, is empty or has fewer than $\lfloor (m-1)/2 \rfloor$ keys, one of the following actions is taken:

> The data from the sibling nodes is redistributed. This is possible if the sibling has more than the minimum
number of keys and one of these keys is enough to bring the number of keys in node TD up to
$\lfloor (m-1)/2 \rfloor$.

> The node TD is merged with the sibling to become a single node. This is possible if the sibling has only
the minimum number of keys. The merger of the two nodes still leaves the number of keys in the
new node less than the maximum.
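A minimal Python sketch of the choice between redistribution and merging is given below. The function name `delete_key` and the single `sibling_count` parameter are assumptions for illustration; pointer handling and the update of the parent node are omitted.

```python
def delete_key(node_keys, key, m, sibling_count):
    """Delete key from a leaf of an order-m tree and report the follow-up action.

    sibling_count is the number of keys held by the sibling under consideration.
    """
    keys = [k for k in node_keys if k != key]
    min_keys = (m - 1) // 2               # minimum number of keys in a leaf
    if keys and len(keys) >= min_keys:
        return keys, "ok"                 # no underflow
    if sibling_count > min_keys:
        return keys, "redistribute"       # sibling can spare a key
    return keys, "merge"                  # sibling holds only the minimum
```

With m = 5 the minimum is two keys per leaf, so deleting from a two-key leaf triggers redistribution when the sibling holds three keys and a merge when it holds two.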

Capacity: The upper and lower limits of the capacity of a B+-tree of order m may be calculated by considering
each node of the tree to be maximally full ($m-1$ keys) or minimally full ($\lceil m/2 \rceil - 1$ keys). Let the height of the tree be h.
As every key must occur in a leaf node, and the leaf nodes may contain a minimum of $\lfloor (m-1)/2 \rfloor$ and a
maximum of $m-1$ keys, the number of keys N satisfies

$$2 \cdot \lfloor (m-1)/2 \rfloor \cdot \lceil m/2 \rceil^{h-2} \le N \le (m-1) \cdot m^{h-1}$$
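As a quick numerical check of these bounds, the following Python sketch evaluates both sides for a given order and height. The function name `capacity_bounds` is hypothetical, and the sketch assumes a height of at least 2, since the lower bound relies on the root having two children.

```python
def capacity_bounds(m, h):
    """Return (lower, upper) bounds on the number of keys in a B+-tree
    of order m and height h, assuming h >= 2."""
    if h < 2:
        raise ValueError("this sketch assumes a height of at least 2")
    min_leaf_keys = (m - 1) // 2          # floor((m-1)/2)
    min_children = (m + 1) // 2           # ceil(m/2)
    lower = 2 * min_leaf_keys * min_children ** (h - 2)
    upper = (m - 1) * m ** (h - 1)
    return lower, upper
```

For example, `capacity_bounds(4, 3)` evaluates to `(4, 48)`: a minimally full tree has 2 * 2 leaves of one key each, and a maximally full tree has 16 leaves of three keys each.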

B-Tree (Balanced Tree): The basic B-tree structure has grown to become one of the most popular techniques
for organizing an index structure. While accessing records using such a structure, several conditions of the tree
must hold in order to reduce disk accesses:

> The height of the tree must be kept to a minimum.

> There must be no empty subtrees above the leaves of the tree.

> The leaves of the tree must be at the same level.

> All nodes except the leaves must have at least two and at most the maximum
number of children.

The Features of a B-tree:-

> There is no redundant storage of search key values, i.e., a B-tree stores each search key
value in only one node, which may contain other search key values.

> The B-tree is inherently balanced and is ordered by only one type of search key.

> The insertion and deletion operations are complex, with time complexity $O(\log_2 n)$.

> The number of keys in the nodes is not always the same. Storage management is complicated
only if you choose to create more space for pointers and keys; otherwise the size of a
node is fixed.

> The B-tree grows at the root, as opposed to the binary tree, BST and AVL trees.

> For a B-tree of order N with n nodes, the height is $O(\log_N n)$. The height of a B-tree increases
only because of a split at the root node.
Advantages of B-tree indexes:-

> There is no overflow problem inherent in this type of organization. It is good for
dynamic tables, those that undergo a great deal of insert/update/delete activity.

> Because it is to a large extent self-maintaining, it is good at supporting 24-hour operation.

> As data is retrieved by the index, it is always presented in order.

> 'Get next' queries are efficient because of the inherent ordering of rows within the index
blocks.

> B-tree indexes are good for very large tables because they need minimal
reorganization.

> There is predictable access time for any retrieval because the B-tree structure keeps itself
balanced, so that there is always the same number of index levels for every retrieval. The number
of index levels does increase with both the number of records and the length of the key value.
Disadvantages of B-tree indexes:-

■ For static tables, there are better organizations that require fewer I/Os. ISAM indexes are preferable to
B-trees in this type of environment.

■ A B-tree is not really appropriate for very small tables because index look-up becomes a significant part
of the overall access time.

■ The index can use considerable disk space, especially in products which allow different users to create
separate indexes on the same table/column combinations.

■ Because the indexes themselves are subject to modification when rows are updated, deleted or inserted,
they are also subject to locking, which can inhibit concurrency.

Difference between B+-tree and B-tree:-

> Retrieval of the next record is relatively easy in the B+-tree. This is not the case in the B-tree unless the
internal nodes of the B-tree are linked in a sequential order.

> Deletions in a B+-tree are always made in the leaf nodes. In a B-tree, a value can be deleted from
any node, making deletions more complicated than in a B+-tree.

> Insertions in a B+-tree are always made in the leaf nodes. In the B-tree, insertions are made at the
lowest non-leaf node. Insertions (or deletions) may cause node splits and thereby affect the height of
the tree in both cases.

> The capacity of the B-tree can be calculated in a manner similar to that used for the B+-tree. The
order of the tree is dictated by physical storage availability, among other factors. For the same buffer
size, the order of the B-tree would be less than that of the B+-tree.

II. ALGORITHM - SEARCHING B+-TREE

Input:  K_s, the search key
Output: found (a Boolean value), and
        A, the address of the record if found

{ node content: [n, T_1, k_1, T_2, k_2, ..., T_n, k_n, T_n+1]; k_n+1 = infinity is assumed }

get root_node
while not leaf_node do
begin
    i := 1
    while not (i > n or K_s < k_i) do
        i := i + 1
    { T_i points to the subtree that may contain K_s }
    get subtree T_i
end { while not leaf_node }

{ search leaf node for key K_s }
{ content of leaf node: [n, P_1, k_1, P_2, k_2, ..., P_n, k_n, P_n+1] }
i := 1
found := false
while not (found or i > n) do
begin
    found := K_s = k_i
    if found then
        A := P_i
    else
        i := i + 1
end { while not (found or i > n) }
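The search procedure above can be rendered as a small runnable Python sketch. The class names `Leaf` and `Internal`, the function name `search`, and the three-leaf sample tree are assumptions made for illustration; record addresses are represented by plain strings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Leaf:
    keys: List[int]         # k_1 .. k_n
    pointers: List[str]     # P_1 .. P_n, here just labels for record addresses


@dataclass
class Internal:
    keys: List[int]                           # k_1 .. k_n
    children: List[Union["Internal", Leaf]]   # T_1 .. T_(n+1)


def search(root: Union[Internal, Leaf], ks: int) -> Tuple[bool, Optional[str]]:
    """Return (found, address) for search key ks."""
    node = root
    while isinstance(node, Internal):
        i = 0
        # Advance while ks >= k_i; child i then covers the range containing ks.
        while i < len(node.keys) and ks >= node.keys[i]:
            i += 1
        node = node.children[i]
    # Linear scan of the leaf, as in the second loop of the algorithm.
    for k, p in zip(node.keys, node.pointers):
        if k == ks:
            return True, p
    return False, None


# Hypothetical two-level tree used only to exercise the function.
root = Internal(
    keys=[5, 9],
    children=[
        Leaf([2, 3, 4], ["P2", "P3", "P4"]),
        Leaf([5, 7, 8], ["P5", "P7", "P8"]),
        Leaf([9], ["P9"]),
    ],
)
```

The number of nodes visited equals the height of the tree, because the descent touches exactly one node per level before the leaf is scanned.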

III. EXAMPLES AND FIGURES 

Example 1: Consider a file containing the following records:

Books    Subject Area
2        Files
3        Database
4        Artificial intelligence
5        Files
7        Discrete structures
8        Software engineering
9        Programming methodology
40       Operating system
50       Graphics
51       Database
52       Data structures

Figure 1: A B+-tree of order 4 on Books. Each $P_i$ is a pointer to the storage area containing records for the key
Books = i; ⊥ represents a null pointer.

Example 2: In the B+-tree of Example 1, let us insert an entry for Books 1. The original contents of
the leaf node (with the label PT_0) in which the key would be inserted are:

PT_0:  [ P_2 2 | P_3 3 | P_4 4 ]  ->  PT_X

This node does not have a left sibling and the right sibling is already full. Hence,
insertion of the key 1 causes a split. Let the new node be PT_N. The contents of
these nodes are shown below:

PT_0:  [ P_1 1 | P_2 2 ]
PT_N:  [ P_3 3 | P_4 4 ]  ->  PT_X

The pair <3, PT_N> is passed to the parent node for insertion as indicated:

[ PT_0 3 PT_N 5 T_1 9 T_2 15 T_3 ]

The insertion causes a split of this node into the following two nodes, with the key value 5, along with a
pointer, passed to the parent of the node:

[ PT_0 3 PT_N ]     5     [ T_1 9 T_2 15 T_3 ]

Let the address of the new node be P_Y. Then the pair <5, P_Y> is passed to the parent node for insertion.

Figure 2: The B+-tree of Example 1 after insertion of the key for Books 1.

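The two splits in Example 2 can be traced with a short Python sketch that tracks keys only. The function names `split_leaf` and `split_internal` are hypothetical, pointers are omitted, and the split positions are chosen to reproduce the node contents given in the example rather than taken from a stated rule.

```python
def split_leaf(keys, new_key):
    """Insert new_key into a full leaf and split it.

    Returns (left_keys, right_keys, separator); the separator is the first
    key of the right leaf and is copied up to the parent.
    """
    keys = sorted(keys + [new_key])
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


def split_internal(keys, new_key):
    """Insert new_key into a full internal node and split it.

    Returns (left_keys, pushed_up_key, right_keys); the middle key moves up
    to the parent and stays in neither half.
    """
    keys = sorted(keys + [new_key])
    mid = (len(keys) - 1) // 2   # assumption chosen to match Example 2
    return keys[:mid], keys[mid], keys[mid + 1:]


leaf_result = split_leaf([2, 3, 4], 1)                  # leaf PT_0 receives key 1
parent_result = split_internal([5, 9, 15], leaf_result[2])
```

Here `leaf_result` holds the keys 1, 2 and 3, 4 with separator 3, and `parent_result` holds key 3 on the left, 5 pushed up, and 9, 15 on the right, which agrees with the node contents listed in the example.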
Example 3: Let us delete the entry for Books 5 from the tree shown in Example 1. The resultant tree is shown in
part (i) of Figure 3.

Figure 3: (i) The B+-tree that results after the deletion of key 5 from the tree of Example 1.

(ii) The B+-tree after the deletion of key 7.



Figure 4: Capacity of a B+-tree. Panels (a) and (b) each tabulate the number of nodes at each level of a tree
of height h (columns: Level; Number of nodes at level).


Example 5: Construct a B+-tree for the following set of key values: (2, 3, 5, 7, 11, 17, 19, 23, 29, 31). Assume that
the number of search key values that fit in one node is (a) 3 and (b) 7.

Figure 5: B+-trees for cases (a) and (b). In (b), one node holds the keys 2, 3, 5, 7, 11, 17, 19.

Example 6: Create a B-tree structure of order 3 for the following relation.

Customer

C_No.    Name    Location
C1       N1      L1
C2       N2      L2
C9       N3      L2
C10      N4      L3
C11      N5      L3
C15      N6      L3
C19      N7      L4
C23      N8      L3
C25      N9      L4
C37      N10     L2
C32      N11     L2
C34      N12     L1

Figure 6:

IV. CONCLUSION

Tree-based data organization schemes are used for both primary and secondary key retrieval. In the
B+-tree scheme, each node of the tree except the leaf nodes contains a set of keys and pointers pointing to
subtrees. The leaf nodes of the B+-tree are similar to the non-leaf or internal nodes except that the pointers in the
leaf nodes point directly or indirectly to storage areas containing the required records. We also examined the
method of performing the search and update operations using the B+-tree and compared the B+-tree with the
B-tree.